Curse of Dimensionality in the Application of Pivot-based Indexes to the Similarity Search Problem
Abstract
In this work we study the validity of the so-called curse of dimensionality for indexing of databases for similarity search. We perform an asymptotic analysis, with a test model based on a sequence of metric spaces (Ω_d) from which we pick datasets X_d in an i.i.d. fashion. We call the subscript d the dimension of the space Ω_d (e.g. for R^d the dimension is just the usual one), and we allow the size of the dataset n = n_d to be such that d is superlogarithmic but subpolynomial in n. We study the asymptotic performance of pivot-based indexing schemes where the number of pivots is o(n/d). We pick the relatively simple cost model of similarity search where we count each distance calculation as a single computation and disregard the rest. We demonstrate that if the spaces Ω_d exhibit the (fairly common) concentration of measure phenomenon, the performance of similarity search using such indexes is asymptotically linear in n. That is, for large enough d, the difference between using such an index and performing a search without an index at all is negligible. Thus we confirm the curse of dimensionality in this setting.
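The cost model in the abstract can be made concrete with a small sketch. The code below is not taken from the paper: it assumes Euclidean vectors drawn uniformly from the unit cube (the paper treats general metric spaces), uses the first few data points as pivots, and the names `PivotIndex`, `range_query`, and `demo` are purely illustrative. It counts only distance computations made at query time, which is the cost measure the abstract describes.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences of floats."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class PivotIndex:
    """Illustrative pivot table: stores d(x, v) for every point x and pivot v."""

    def __init__(self, points, pivots, dist=euclidean):
        self.points = points
        self.pivots = pivots
        self.dist = dist
        # table[i][j] = distance from points[i] to pivots[j], computed at build time.
        self.table = [[dist(p, v) for v in pivots] for p in points]

    def range_query(self, q, radius):
        """Return (indices within `radius` of q, distance computations used)."""
        q_to_pivot = [self.dist(q, v) for v in self.pivots]
        cost = len(self.pivots)  # one computation per pivot
        hits = []
        for i, row in enumerate(self.table):
            # Triangle inequality: |d(q, v) - d(x, v)| <= d(q, x) for every pivot v.
            lower = max((abs(a - b) for a, b in zip(q_to_pivot, row)), default=0.0)
            if lower > radius:
                continue  # x is discarded without computing d(q, x)
            cost += 1
            if self.dist(q, self.points[i]) <= radius:
                hits.append(i)
        return hits, cost


def demo(dim, n=2000, k=8, seed=0):
    """Assumed toy experiment: fraction of the scan cost n spent by one query."""
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(dim)] for _ in range(n)]
    index = PivotIndex(points, points[:k])  # assumption: first k points as pivots
    q = [rng.random() for _ in range(dim)]
    # Radius equal to the nearest-neighbour distance, so the query is non-empty.
    radius = min(euclidean(q, p) for p in points)
    hits, cost = index.range_query(q, radius)
    return len(hits), cost / n


if __name__ == "__main__":
    for dim in (2, 8, 32, 64):
        found, fraction = demo(dim)
        print(f"d={dim:3d}  hits={found}  cost/n={fraction:.3f}")
```

Running the script prints the cost ratio for a few dimensions. Under these assumptions one would expect the ratio to move toward 1 as d grows, since the pivot lower bounds stop excluding points once distances concentrate, but this toy run is only an illustration and not a substitute for the asymptotic argument.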
Similar resources
Overcoming the Curse of Dimensionality?
We study the behavior of pivot-based algorithms for similarity searching in metric spaces. We show that they are effective tools for intrinsically high-dimensional spaces, and that their performance is basically dependent on the number of pivots used and the precision used to store the distances. In this paper we give a simple yet effective recipe for practitioners seeking for a black-box method ...
Fractal Compression Using the Discrete Karhunen-Loeve Transform
Fractal coding of images is a quite recent and efficient method whose major drawback is the very slow compression phase, due to a time-consuming similarity search between image blocks. A general acceleration method based on feature vectors is described, of which we can find many instances in the literature. This general method is then optimized using the well-known Karhunen-Loeve expansion, allowi...
Near neighbor searching with K nearest references
Proximity searching is the problem of retrieving, from a given database, those objects closest to a query. To avoid exhaustive searching, data structures called indexes are built on the database prior to serving queries. The curse of dimensionality is a well-known problem for indexes: in spaces with sufficiently concentrated distance histograms, no index outperforms an exhaustive scan of the da...
Physical Database Design for Efficient Time-Series Similarity Search
Similarity search in time-series databases finds such data sequences whose changing patterns are similar to that of a query sequence. For efficient processing, it normally employs a multi-dimensional index. In order to alleviate the well-known dimensionality curse, the previous methods for similarity search apply the Discrete Fourier Transform (DFT) to data sequences, and take only the first tw...
A Parallel Genetic Algorithm Based Method for Feature Subset Selection in Intrusion Detection Systems
Intrusion detection systems are designed to provide security in computer networks, so that if the attacker crosses other security devices, they can detect and prevent the attack process. One of the most essential challenges in designing these systems is the so-called curse of dimensionality. Therefore, in order to obtain satisfactory performance in these systems we have to take advantage of app...
Journal title:
- CoRR
Volume abs/0905.2141, issue -
Pages -
Publication date 2009